Clustering of consecutive numbers in permutations under Mallows distributions and super-clustering under general p-shifted distributions
Abstract
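Let $A^{(n)}_{l;k}\subset S_n$ denote the set of permutations of $[n]$ for which the set of $l$ consecutive numbers $\{k,k+1,\ldots,k+l-1\}$ appears in a set of consecutive positions. Under the uniform probability measure $P_n$ on $S_n$, one has $P_n(A^{(n)}_{l;k})\sim\frac{l!}{n^{l-1}}$ as $n\to\infty$. In part of this paper we consider this clustering under the Mallows distributions $P_n^q$, $q>0$. Because of a duality, it suffices to consider $q\in(0,1)$. We show that for $q_n=1-\frac{c}{n^\alpha}$, with $c>0$ and $\alpha\in(0,1)$, the probability $P_n^{q_n}(A^{(n)}_{l;k_n})$ is of order $\frac{1}{n^{\alpha(l-1)}}$, uniformly over all sequences $\{k_n\}_{n=1}^\infty$. Thus, letting $N^{(n)}_l=\sum_{k=1}^{n-l+1}1_{A^{(n)}_{l;k}}$ denote the number of sets of $l$ consecutive numbers appearing in consecutive positions, $E_n^{q_n}N^{(n)}_l$ is of order $n^{1-\alpha(l-1)}$, and we have
$$\lim_{n\to\infty}E_n^{q_n}N^{(n)}_l=\begin{cases}\infty, & \text{if } l<1+\frac{1}{\alpha};\\ 0, & \text{if } l>1+\frac{1}{\alpha}.\end{cases}$$
We also consider the cases $\alpha=1$ and $\alpha>1$. In the other part of the paper we consider general p-shifted distributions $P_n^{\{p_j\}_{j=1}^\infty}$, of which the Mallows distribution is a particular case. We calculate explicitly the quantity
$$\lim_{l\to\infty}\liminf_{n\to\infty}P_n^{\{p_j\}_{j=1}^\infty}(A^{(n)}_{l;k_n})=\lim_{l\to\infty}\limsup_{n\to\infty}P_n^{\{p_j\}_{j=1}^\infty}(A^{(n)}_{l;k_n})$$
in terms of the p-distribution. When this quantity is positive, we say that super-clustering occurs. In particular, super-clustering occurs for the Mallows distribution with fixed parameter $q\neq1$.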
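As an illustration (not code from the paper), the Python sketch below makes the objects in the abstract concrete for small $n$. Under the uniform measure the event $A^{(n)}_{l;k}$ can be counted exactly: choose the first of the $l$ consecutive positions ($n-l+1$ ways), order the block ($l!$ ways) and the remaining numbers ($(n-l)!$ ways), so $P_n(A^{(n)}_{l;k})=\frac{(n-l+1)\,l!\,(n-l)!}{n!}\sim\frac{l!}{n^{l-1}}$. For sampling, the sketch assumes the sequential description of a p-shifted permutation (repeatedly take the $j$-th smallest remaining number with probability proportional to $p_j$) and the standard convention $P_n^q(\sigma)\propto q^{\mathrm{inv}(\sigma)}$. With geometric $p_j=(1-q)q^{j-1}$ this reproduces that Mallows law, because choosing the $j$-th smallest remaining number creates exactly $j-1$ inversions; brute-force enumeration with weights $q^{\mathrm{inv}(\sigma)}$ serves as a cross-check. All function names, and the choices $c=1$, $\alpha=0.5$ in the demo, are hypothetical.

```python
import itertools
import math
import random


def inverse(perm):
    """pos[v] = 0-based position of value v in perm (a permutation of 1..n)."""
    pos = [0] * (len(perm) + 1)
    for i, v in enumerate(perm):
        pos[v] = i
    return pos


def is_clustered(pos, k, l):
    """True iff k, k+1, ..., k+l-1 occupy l consecutive positions (in any order),
    i.e. the permutation lies in A_{l;k}."""
    block = [pos[v] for v in range(k, k + l)]
    return max(block) - min(block) == l - 1


def count_clusters(perm, l):
    """N_l: number of k in 1..n-l+1 such that perm lies in A_{l;k}."""
    pos = inverse(perm)
    return sum(is_clustered(pos, k, l) for k in range(1, len(perm) - l + 2))


def uniform_prob(n, l):
    """Exact uniform probability (n-l+1) * l! * (n-l)! / n!, asymptotic to l!/n^(l-1)."""
    return (n - l + 1) * math.factorial(l) * math.factorial(n - l) / math.factorial(n)


def inversions(perm):
    """Number of pairs (earlier, later) with earlier > later."""
    return sum(1 for a, b in itertools.combinations(perm, 2) if a > b)


def exact_mallows_prob(n, l, k, q):
    """P_n^q(A_{l;k}) by brute-force enumeration with weights q^inv (small n only)."""
    total = hit = 0.0
    for perm in itertools.permutations(range(1, n + 1)):
        w = q ** inversions(perm)
        total += w
        if is_clustered(inverse(perm), k, l):
            hit += w
    return hit / total


def sample_p_shifted(n, p, rng):
    """Assumed sequential p-shifted construction: repeatedly remove the j-th smallest
    remaining number with probability proportional to p(j), j = 1, 2, ..."""
    remaining = list(range(1, n + 1))
    perm = []
    while remaining:
        weights = [p(j) for j in range(1, len(remaining) + 1)]
        idx = rng.choices(range(len(remaining)), weights=weights)[0]
        perm.append(remaining.pop(idx))
    return perm


def mallows_p(q):
    """Geometric p_j = (1-q) q^(j-1) for 0 < q < 1; yields P_n^q(sigma) ∝ q^inv(sigma)."""
    return lambda j: (1 - q) * q ** (j - 1)


def mc_mean_clusters(n, l, p, trials, seed=0):
    """Monte Carlo estimate of E N_l under the p-shifted law."""
    rng = random.Random(seed)
    return sum(count_clusters(sample_p_shifted(n, p, rng), l) for _ in range(trials)) / trials


if __name__ == "__main__":
    # Uniform sanity check: both values equal 0.2 for n = 6, l = 3, k = 2.
    print(exact_mallows_prob(6, 3, 2, 1.0), uniform_prob(6, 3))
    # Hypothetical regime q_n = 1 - c / n**alpha with c = 1, alpha = 0.5.
    n, alpha = 100, 0.5
    print(mc_mean_clusters(n, 2, mallows_p(1 - 1 / n**alpha), trials=100))
```

Monte Carlo estimates at moderate $n$ only hint at the asymptotic regimes in the abstract; the enumeration routine is exponential in $n$ and is meant purely as a check on the sampler.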
Similar resources
Classification and properties of acyclic discrete phase-type distributions based on geometric and shifted geometric distributions
Acyclic phase-type distributions form a versatile model, serving as approximations to many probability distributions in various circumstances. They exhibit special properties and characteristics that usually make their applications attractive. Compared to acyclic continuous phase-type (ACPH) distributions, acyclic discrete phase-type (ADPH) distributions and their subclasses (ADPH family) have ...
Clustering Multivariate Normal Distributions
In this paper, we consider the task of clustering multivariate normal distributions with respect to the relative entropy into a prescribed number, k, of clusters using a generalization of Lloyd’s k-means algorithm [1]. We revisit this information-theoretic clustering problem under the auspices of mixed-type Bregman divergences, and show that the approach of Davis and Dhillon [2] (NIPS*06) can a...
The clustering and classification data mining techniques in insurance fraud detection: the case of Iranian car insurance
Given the ever-increasing spread of fraud in insurance, especially in car insurance, and its negative consequences for insurance companies, applying suitable and efficient methods to identify and detect fraud in this area is essential. Understanding the patterns in data on previously reported claims can help determine whether a damage claim is genuine or not. One of the most common and widely used ways of discovering patterns in data is the use of ...
Parallel D2-Clustering: Large-Scale Clustering of Discrete Distributions
The discrete distribution clustering algorithm, namely D2-clustering, has demonstrated its usefulness in image classification and annotation where each object is represented by a bag of weighted vectors. The high computational complexity of the algorithm, however, limits its applications to large-scale problems. We present a parallel D2-clustering algorithm with substantially improved scalabilit...
Single-machine scheduling with general costs under compound-type distributions
We investigate in this paper the problem of scheduling n jobs on a single machine with general stochastic cost functions, which involves the effects of stochastic due times, and with the stochastic processing times in a class of distributions, which covers the family of exponential distributions, the family of geometrical distributions and some other important distribution families as its speci...
Journal
Journal title: Electronic Journal of Probability
Year: 2022
ISSN: 1083-6489
DOI: https://doi.org/10.1214/22-ejp812